A Tufte Handout On Hypothesis Testing and Confidence Intervals for the Mean of Quantities

Hypothesis Testing and Confidence Intervals

Robert W. Walker

2023-02-24

cars data

I will work with R’s internal dataset on cars: cars. There are two variables in the dataset, speed [speed] and distance [dist]. This is what they look like.

library(tidyverse)
GGP1 <- ggplot(cars) +
  aes(x=speed, y=dist) +
  geom_point() +
  theme_minimal() +
  labs(x="Speed", y="Stopping Distance")
library(plotly)
ggplotly(GGP1)

Scatterplot of cars data

Hypothesis tests require two things:

  1. A claim based on target/hypothesized value
    • Null equal and alternative two-sided [not equal]: \(\mu = target\) vs. \(\mu \neq target\)
    • Null less than or equal and alternative greater: \(\mu \leq target\) vs. \(\mu > target\)
    • Null greater than or equal and alternative lesser: \(\mu \geq target\) vs. \(\mu < target\)
  2. A confidence level that is a threshold in probability for rendering a true/false answer.

Hypothesis Testing and \(t\)

I will work with the speed variable. On a basic level, a hypothesis is a specific value for the average here. For whatever reason, \(\mu=17\) motivates us. I consider all three distinct hypotheses though this is not entirely proper in the sense that any given application of hypothesis testing should have a clear hypothesis about the variable of interest rather than waffling amongst all of them as I will. I do so only for illustrative purposes. To begin, I need to specify both the hypothesis and the confidence level that I intend to use to evaluate it.

Before getting to equations, something is important to note. The mean of the data, \(\overline{x}\), the standard deviation, \(s\), and the sample size, \(n\), are all known from the data. There are then only two remaining unknowns: \(t\) and \(\mu\).

The \(t\) equation is given by:

\[t=\frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}}\]

One further algebraic manipulation before starting. Let’s solve for \(\mu\).

\[\mu = \overline{x} - t\left(\frac{s}{\sqrt{n}}\right)\]

This fully defines the parts we require and the core problem because the data essentially gives us almost all of the unknowns. In some very basic way, conditional on data, only \(\mu\) or \(t\) from a given probability remain unknown.
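
To make the two equations concrete, here is a minimal base R sketch that computes the known quantities from cars$speed and then moves between \(t\) and \(\mu\). The object names [xbar, se, t_stat, mu_back] are illustrative choices of mine and appear nowhere else in this handout.

```r
# Known quantities, all computed from the data
x    <- cars$speed
n    <- length(x)      # sample size
xbar <- mean(x)        # sample mean
s    <- sd(x)          # sample standard deviation
se   <- s / sqrt(n)    # standard error of the mean

# Given a hypothesized mu, solve for t
mu0    <- 17
t_stat <- (xbar - mu0) / se

# Given that t, solve for mu; this recovers the hypothesized value
mu_back <- xbar - t_stat * se

round(c(xbar = xbar, s = s, se = se, t = t_stat, mu = mu_back), 4)
```

Fixing either \(t\) or \(\mu\) pins down the other; everything else comes from the data.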

Details of the Data

library(ResampleProps)
library(gt)
library(gtExtras)
Resampled.Means <- data.frame(Mean.Speed = ResampleMean(cars$speed, k=10000))
library(tidyverse)
Tab.Res <- cars %>% summarise(Mean = mean(speed),
                   SD = round(sd(speed), digits=3),
                   SE = round(sd(speed)/sqrt(n()), digits=3),
                   N = n(),
                   P05 = round(quantile(Resampled.Means$Mean.Speed, probs=0.05), digits=3),
                   P95 = round(quantile(Resampled.Means$Mean.Speed, probs=0.95), digits=3),
                   P10 = round(quantile(Resampled.Means$Mean.Speed, probs=0.1), digits=3),
                   P90 = round(quantile(Resampled.Means$Mean.Speed, probs=0.9), digits=3)
)
Table.Res <- Tab.Res %>% gt() %>%  
  tab_spanner(
    label = "Percentile of Simulated Means",
    columns = c(P05,P95,P10,P90)) %>%
  gt_theme_nytimes() %>% 
  tab_header(title = "Speed from cars: The Distribution of the Average")
Q05 <- quantile(Resampled.Means$Mean.Speed, probs=0.05)
Q95 <- quantile(Resampled.Means$Mean.Speed, probs=0.95)
Q1 <- quantile(Resampled.Means$Mean.Speed, probs=0.1)
Q9 <- quantile(Resampled.Means$Mean.Speed, probs=0.9)
Resampled.Means %>% ggplot() + 
  aes(x=Mean.Speed) + 
  geom_density() + 
  geom_vline(aes(xintercept=Q05), color="red") +
  geom_vline(aes(xintercept=Q95), color="red") +
  theme_minimal() + 
  labs(x="Resampled Average", title="Resampled Means", subtitle="Middle 90% in red")

The statistics of it all

\[t = \frac{15.4 - 17}{\frac{s}{\sqrt{n}}} = \frac{-1.6}{0.7478}=-2.14\]

There are 2.14 standard errors between the mean that we obtain from the data and the hypothetical mean of 17; the value from the data is smaller, hence the negative.

The middle 90% of the simulated means range from 14.2 to 16.6.
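
For readers without the ResampleProps package, the resampling idea can be sketched in base R. This assumes that ResampleMean() performs an ordinary bootstrap [sampling the data with replacement and averaging each resample]; I have not verified its internals, so treat this as an approximation of the approach rather than a reproduction. The seed is arbitrary.

```r
set.seed(2023)  # arbitrary seed, used only so the sketch is reproducible
x <- cars$speed

# Assumed procedure: draw n values with replacement, take the mean, repeat
boot_means <- replicate(
  10000,
  mean(sample(x, size = length(x), replace = TRUE))
)

# Middle 90% of the resampled means
mid90 <- quantile(boot_means, probs = c(0.05, 0.95))
mid90
```

The exact percentiles vary a little with the seed, but they should land close to the 14.2 and 16.6 reported in the table.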

Table.Res

Speed from cars: The Distribution of the Average

                            Percentile of Simulated Means
Mean   SD      SE      N    P05    P95    P10     P90
15.4   5.288   0.748   50   14.2   16.6   14.46   16.34

Case 1: A Two-sided Alternative

library(radiant)
result <- prob_tdist(df = 49, plb = 0.05, pub=0.95)
# summary(result, type = "probs")
p1 <- plot(result, type = "probs") + 
  theme_minimal() +
  geom_label(aes(x=-1.677, y=0.25, label="t=-1.677")) +
  geom_label(aes(x=1.677, y=0.25, label="t=1.677")) +
  geom_label(aes(x=0, y=0.15, label="p=0.9"), size=2) +
  geom_label(aes(x=-3, y=0.075, label="p=0.05"), size=2) +
  geom_label(aes(x=3, y=0.075, label="p=0.05"), size=2) +
  labs(title="Critical Values", x="t")  +
  xlim(-4,4)
result2 <- prob_tdist(df = 49, lb = -2.14, ub=2.14)
p2 <- plot(result2) + theme_minimal() +
  geom_vline(aes(xintercept=-2.14), color="#099587") +
  geom_vline(aes(xintercept=2.14), color="#099587")  +
  geom_label(aes(x=-2.14, y=0.25, label="t=-2.14")) +
  geom_label(aes(x=2.14, y=0.25, label="t=2.14")) +
  geom_label(aes(x=0, y=0.15, label="p=0.963"), size=2) +
  geom_label(aes(x=-3, y=0.075, label="p=0.0187"), size=2) +
  geom_label(aes(x=3, y=0.075, label="p=0.0187"), size=2) +
  guides(size="none") +
  labs(title="Result", x="t") + 
  xlim(-4,4)
library(patchwork)
p1 + p2

Critical values and p-values

Critical Values
P(-1.677 < X < 1.677)     = 0.9
1 - P(-1.677 < X < 1.677) = 0.1

Result: The p-value
P(X < -2.14) = 0.019
P(X > 2.14) = 0.019
P(-2.14 < X < 2.14)     = 0.963
1 - P(-2.14 < X < 2.14) = 0.037

t.test

Alternative: Two-sided

t.test(cars$speed, mu=17, conf.level = 0.9)
## 
##  One Sample t-test
## 
## data:  cars$speed
## t = -2.1397, df = 49, p-value = 0.03739
## alternative hypothesis: true mean is not equal to 17
## 90 percent confidence interval:
##  14.1463 16.6537
## sample estimates:
## mean of x 
##      15.4
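
The printed output can also be pulled apart programmatically. The sketch below stores the test object, extracts its pieces, and rebuilds the two-sided 90% interval by hand from qt(); the names tt and ci_by_hand are mine.

```r
# Store the test so its components can be extracted
tt <- t.test(cars$speed, mu = 17, conf.level = 0.9)

unname(tt$statistic)       # the t statistic
tt$p.value                 # the two-sided p-value
as.numeric(tt$conf.int)    # the 90% confidence interval

# The same interval by hand: x-bar plus or minus t(0.95, df = 49) standard errors
x  <- cars$speed
se <- sd(x) / sqrt(length(x))
ci_by_hand <- mean(x) + c(-1, 1) * qt(0.95, df = length(x) - 1) * se
ci_by_hand
```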

radiant

Alternative: Two-sided

result <- single_mean(
  cars, 
  var = "speed", 
  comp_value = 17, 
  conf_lev = 0.9
)
summary(result)
# plot(result, plots = "hist", custom = FALSE)

Analytics

Knowing only the sample size, 50, and the confidence level is sufficient to determine what \(t\) must be to reject a mean of 17 in favor of the alternative that the true mean is not 17. A two-sided alternative implies two boundaries: the lower rejection region holds 0.05 probability [spanning 0 to 0.05] and the upper rejection region holds 0.05 probability [spanning 0.95 to 1], so that the interior range represents 0.9 probability as required.

  • Critical Values: The sample mean would have to be at least 1.677 standard errors away from 17, in either direction, to rule out a mean of 17 with 90% confidence.

  • Result: Our sample mean is 2.14 standard errors away from 17. The probability of something equally or more extreme is 0.037. This is known as the p-value. Because 0.037 is less than 0.1, we reject a mean of 17 with 90% confidence.
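
The same critical values and p-value can be obtained from base R’s qt() and pt() without radiant. This is a minimal sketch; the object names are illustrative.

```r
df    <- 49
t_obs <- (mean(cars$speed) - 17) / (sd(cars$speed) / sqrt(50))

# Two-sided critical values: 0.05 in each tail leaves 0.9 in the middle
crit <- qt(c(0.05, 0.95), df = df)

# Two-sided p-value: probability of a t at least this far from 0, both tails
p_two_sided <- 2 * pt(-abs(t_obs), df = df)

# Reject when the observed t falls outside the critical values
reject <- abs(t_obs) > crit[2]

round(c(lower = crit[1], upper = crit[2], t = t_obs, p = p_two_sided), 4)
reject
```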

Case 2: A Lesser Alternative

library(radiant)
result <- prob_tdist(df = 49, plb = 0.1)
p1a <- plot(result, type = "probs") + theme_minimal() + geom_label(aes(x=-1.299, y=0.25, label="t=-1.299")) +
  geom_label(aes(x=0, y=0.075, label="p=0.9"), size=2) +
  geom_label(aes(x=-3, y=0.075, label="p=0.1"), size=2) +
  labs(title="Critical Value") +
  xlim(-4,4)
result2 <- prob_tdist(df = 49, lb = -2.14)
p2a <- plot(result2) + theme_minimal() +
  geom_vline(aes(xintercept=-2.14), color="#099587") +
  geom_label(aes(x=-2.25, y=0.25, label="t=-2.14")) +
  geom_label(aes(x=0, y=0.075, label="p=0.981"), size=2) +
  geom_label(aes(x=-3, y=0.075, label="p=0.0187"), size=2) +
  guides(size="none") +
  labs(title="Result", x="t") + 
  xlim(-4,4)
library(patchwork)
p1a + p2a

Critical values and p-values

Critical Values
P(X < -1.299) = 0.1
P(X > -1.299) = 0.9

Result: The p-value
P(X < -2.14) = 0.019
P(X > -2.14) = 0.981

t.test

Alternative: Lesser

t.test(cars$speed, mu=17, conf.level = 0.9, alternative="less")
## 
##  One Sample t-test
## 
## data:  cars$speed
## t = -2.1397, df = 49, p-value = 0.01869
## alternative hypothesis: true mean is less than 17
## 90 percent confidence interval:
##      -Inf 16.37143
## sample estimates:
## mean of x 
##      15.4

radiant

Alternative: Less

result <- single_mean(
  cars, 
  var = "speed", 
  comp_value = 17, 
  alternative = "less", 
  conf_lev = 0.9
)
summary(result)
# plot(result, plots = "hist", custom = FALSE)

Analytics

Knowing only the sample size, 50, and the confidence level is sufficient to determine what \(t\) must be to reject a mean of 17 or greater in favor of the alternative that the true mean is less than 17. With 90% confidence, the rejection region is the lower tail holding 0.1 probability; the remaining 0.9 probability lies above the critical value and is consistent with a mean of 17 or greater.

  • Critical Values: The sample mean would have to be at least 1.299 standard errors below 17 [\(t \leq -1.299\)] to rule out a mean of 17 or greater with 90% confidence.

  • Result: Our sample mean is 2.14 standard errors below 17 [\(t = -2.14\)]. The probability of something equal or lesser is 0.019. This is known as the p-value.
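
For the lesser alternative, only the lower tail matters. A short base R sketch of the critical value and p-value, with names of my own choosing:

```r
df    <- 49
t_obs <- (mean(cars$speed) - 17) / (sd(cars$speed) / sqrt(50))

# All 0.1 of the rejection probability sits in the lower tail
crit_less <- qt(0.1, df = df)

# p-value: probability of a t this small or smaller
p_less <- pt(t_obs, df = df)

round(c(critical = crit_less, t = t_obs, p = p_less), 4)
t_obs < crit_less   # TRUE means reject a mean of 17 or greater
```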

Case 3: A Greater Alternative

result <- prob_tdist(df = 49, plb = 0.9)
p1a <- plot(result, type = "probs") + theme_minimal() + geom_label(aes(x=1.299, y=0.25, label="t=1.299")) +
  geom_label(aes(x=0, y=0.075, label="p=0.9"), size=2) +
  geom_label(aes(x=3, y=0.075, label="p=0.1"), size=2) +
  labs(title="Critical Value") +
  xlim(-4,4)
result2 <- prob_tdist(df = 49, lb = -2.14)
p2a <- plot(result2) + theme_minimal() +
  geom_vline(aes(xintercept=-2.14), color="#099587") +
  geom_label(aes(x=-2.25, y=0.25, label="t=-2.14")) +
  geom_label(aes(x=0, y=0.075, label="p=0.981"), size=2) +
  geom_label(aes(x=-3, y=0.075, label="p=0.0187"), size=2) +
  guides(size="none") +
  labs(title="Result", x="t") + 
  xlim(-4,4)
library(patchwork)
p1a + p2a

Critical values and p-values

Critical Values
P(X < 1.299) = 0.9
P(X > 1.299) = 0.1

Result: The p-value
P(X < -2.14) = 0.019
P(X > -2.14) = 0.981

t.test

Alternative: Greater

t.test(cars$speed, mu=17, conf.level = 0.9, alternative="greater")
## 
##  One Sample t-test
## 
## data:  cars$speed
## t = -2.1397, df = 49, p-value = 0.9813
## alternative hypothesis: true mean is greater than 17
## 90 percent confidence interval:
##  14.42857      Inf
## sample estimates:
## mean of x 
##      15.4

radiant

Alternative: Greater

result <- single_mean(
  cars, 
  var = "speed", 
  comp_value = 17, 
  alternative = "greater", 
  conf_lev = 0.9
)
summary(result)
## Single mean test
## Data      : cars 
## Variable  : speed 
## Confidence: 0.9 
## Null hyp. : the mean of speed = 17 
## Alt. hyp. : the mean of speed is > 17 
## 
##    mean  n n_missing    sd    se    me
##  15.400 50         0 5.288 0.748 1.254
## 
##  diff    se t.value p.value df    10% 100%  
##  -1.6 0.748   -2.14   0.981 49 14.429  Inf  
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# plot(result, plots = "hist", custom = FALSE)

Analytics

Knowing only the sample size, 50, and the confidence level is sufficient to determine what \(t\) must be to reject a mean of 17 or less in favor of the alternative that the true mean is greater than 17. With 90% confidence, the rejection region is the upper tail holding 0.1 probability; the remaining 0.9 probability lies below the critical value and is consistent with a mean of 17 or less.

  • Critical Values: The sample mean would have to be at least 1.299 standard errors above 17 [\(t \geq 1.299\)] to rule out a mean of 17 or less with 90% confidence.

  • Result: Our sample mean is 2.14 standard errors below 17 [\(t = -2.14\)]. The probability of something equal or greater is 0.981. This is known as the p-value. Because 0.981 is far greater than 0.1, we cannot reject a mean of 17 or less.
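
It may help to see how the three p-values relate to one another. In this base R sketch [object names are illustrative], the two one-sided p-values sum to one and the two-sided p-value is twice the smaller of them.

```r
df    <- 49
t_obs <- (mean(cars$speed) - 17) / (sd(cars$speed) / sqrt(50))

p_less    <- pt(t_obs, df = df)                      # alternative: less
p_greater <- pt(t_obs, df = df, lower.tail = FALSE)  # alternative: greater
p_two     <- 2 * min(p_less, p_greater)              # alternative: two-sided

# Critical value for the greater alternative: 0.1 in the upper tail
crit_greater <- qt(0.9, df = df)

round(c(less = p_less, greater = p_greater, two_sided = p_two,
        critical = crit_greater), 4)
t_obs > crit_greater   # FALSE means we cannot reject a mean of 17 or less
```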

Equations

First, doing the math by hand, I get:

\[ t = \frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}} = \frac{15.4 - 17}{\frac{5.29}{\sqrt{50}}} = -2.14 \]

Interpreting the result, the sample mean is 2.14 standard errors below the hypothetical mean of 17. Any \(t\) less than -1.299 would have been sufficient to reject the claim that \(\mu \geq 17\) and conclude that it must be less than 17 with 90% confidence.

I could also use the second manipulation above to study this in the original metric. We would accept [or fail to reject] the claim that \(\mu \geq 17\) if \(t \geq -1.299\). Let’s plug this in.

\[\frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}} \geq t \rightarrow \overline{x} \geq \mu + t\left(\frac{s}{\sqrt{n}}\right) \rightarrow \overline{x} \geq 17 + (-1.299)\left(\frac{5.29}{\sqrt{50}}\right) \rightarrow \overline{x} \geq 16.03\]

Put in words, we could sustain the belief that \(\mu \geq 17\) with 90% confidence as long as we see a sample mean of 16.03 or greater given this standard deviation and sample size. Ours is 15.4 so we must reject the claim that \(\mu \geq 17\) with 90% confidence.
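
The threshold in the original metric takes one line of base R. A minimal sketch, with object names that are mine:

```r
x  <- cars$speed
se <- sd(x) / sqrt(length(x))

# Smallest sample mean that still sustains mu >= 17 with 90% confidence
threshold <- 17 + qt(0.1, df = length(x) - 1) * se
threshold

# Our sample mean falls below the threshold, so the claim is rejected
reject <- mean(x) < threshold
reject
```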

plot(17 + 0.7478*seq(-5,5, by=0.01), dt(seq(-5,5, by=0.01), df=49), xlab=expression(paste(mu)), main="centered on 17, measured in mph", ylab="Density", type="l")

A p-value

The probability of a sample mean of 15.4 [or smaller] given a true average of 17, this standard deviation and sample size is pt(-2.14, 49) = 0.0186798. This is referred to as a p-value. Notice that this probability is less than 0.1; thus, with at least 90% confidence, the true mean is not 17 or greater and must be smaller. In general, a p-value less than 100 minus the confidence level [expressed as a percentage] or 1 minus the confidence level [expressed as a probability] is sufficient to reject the hypothesis.

Giving an interpretation of this, assuming the hypothetical mean [17 or greater] is true, the probability of generating a sample mean of 15.4 or smaller is only 0.0187 and this is far less than the 10% permissible outside of 90% confidence. Indeed, any sample mean more than 1.299 standard errors below 17 would be too small to sustain the belief that the true mean is 17 or greater because qt(0.1, 49) is -1.299. Put in the original metric, any sample mean below 16.0285747 would require a rejection of the claim that the true mean is 17 or greater with 90% confidence.

The Confidence Interval

The confidence interval is always centered on the sample mean. Rearranging the equation above and solving for \(\mu\) given the \(t\) above, we get

\[ \mu = \overline{x} - t\left(\frac{s}{\sqrt{n}}\right) = 15.4 - \left(-1.299 \times \frac{5.29}{\sqrt{50}}\right) = 16.37143 \]

With 90% confidence, given this sample mean, the true value should be less than 16.37143.
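
As a check on the arithmetic, the upper bound can be computed with unrounded inputs and compared with the interval that t.test() reports for the lesser alternative. A short sketch; the names upper and tt_less are illustrative.

```r
x  <- cars$speed
se <- sd(x) / sqrt(length(x))

# Upper bound: x-bar minus the [negative] 10th percentile t times the standard error
upper <- mean(x) - qt(0.1, df = length(x) - 1) * se
upper

# t.test() reports the same bound as the finite end of its one-sided interval
tt_less <- t.test(x, mu = 17, conf.level = 0.9, alternative = "less")
tt_less$conf.int[2]
```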

The native t.test

t.test(cars$speed, conf.level = 0.9, alternative = "less", mu=17)
## 
##  One Sample t-test
## 
## data:  cars$speed
## t = -2.1397, df = 49, p-value = 0.01869
## alternative hypothesis: true mean is less than 17
## 90 percent confidence interval:
##      -Inf 16.37143
## sample estimates:
## mean of x 
##      15.4

Simplifying?

Rearranging

\[ t(\frac{s}{\sqrt{n}}) = \overline{x} - \mu \]

can lead to either:

\[ \overline{x} - t(\frac{s}{\sqrt{n}}) = \mu \]

or

\[ \overline{x} = \mu + t(\frac{s}{\sqrt{n}}) \]

So a negative \(t\) places \(\overline{x}\) below \(\mu\) and, equivalently, the implied \(\mu\) above \(\overline{x}\); a positive \(t\) places \(\overline{x}\) above \(\mu\) and the implied \(\mu\) below \(\overline{x}\).

  1. A hypothesis test given \(\mu\) with an alternative that is less must then render an upper bound given \(\overline{x}\).
  2. A hypothesis test given \(\mu\) with an alternative that is greater must then render a lower bound given \(\overline{x}\).
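
This pairing of tests and bounds can be checked numerically. The sketch below computes both one-sided bounds and then tests a handful of hypothesized means against the lesser alternative; the grid of values in mu_grid is arbitrary and chosen by me for illustration.

```r
x  <- cars$speed
n  <- length(x)
se <- sd(x) / sqrt(n)

upper <- mean(x) - qt(0.1, df = n - 1) * se   # bound paired with alternative "less"
lower <- mean(x) - qt(0.9, df = n - 1) * se   # bound paired with alternative "greater"

# Arbitrary hypothesized means on either side of the upper bound
mu_grid <- c(16, 16.3, 16.4, 17)
p_less  <- sapply(mu_grid, function(m) pt((mean(x) - m) / se, df = n - 1))
rejected <- p_less < 0.1

data.frame(mu0 = mu_grid, p_less = round(p_less, 4),
           rejected = rejected, above_upper = mu_grid > upper)
```

A hypothesized mean is rejected in favor of the lesser alternative exactly when it sits above the upper bound, which is why the two columns on the right agree.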

A graphical representation

Given a sample size \(n\), some unknown constant \(\mu\) and satisfaction of Lindeberg’s condition, the standardized sample mean, \((\overline{x} - \mu)/(s/\sqrt{n})\), follows a \(t\) distribution with \(n-1\) degrees of freedom. To render a graphical representation, let’s arbitrarily set \(n\) to 50, as in the above example. Here is a plot.

plot(seq(-5,5, by=0.01), dt(seq(-5,5, by=0.01), df=49), xlab=expression(paste("x-bar -",mu," (measured in std. errors of the mean)", sep="")), ylab="Density", type="l")

Inverting the scale transformation

We can now reverse the scale by the standard error of the mean. In the above example, it is 0.7478. Measured in miles per hour, we obtain:

plot(seq(-5,5, by=0.01)*0.7478, dt(seq(-5,5, by=0.01), df=49), xlab=expression(paste("x-bar -",mu," (measured in mph)", sep="")), ylab="Density", type="l")

Now we will take the concrete example above.

The Hypothesis Test

We claim that the true mean is 17 or greater. Now we need to center the distribution above as though the claim is true.

plot(x=17+seq(-5,5, by=0.01)*0.7478, dt(seq(-5,5, by=0.01), df=49), xlab=expression(paste(mu," (measured in mph)", sep="")), ylab="Density", type="l")
abline(v=17, col="red")
polygon(x = c(17+seq(0,5, by=0.01)*0.7478, 21), y = c(0, dt(seq(0,5, by=0.01), df=49)), col = "red")

The sample mean is estimated to be 15.4. How likely is that?

plot(x=17+seq(-5,5, by=0.01)*0.7478, dt(seq(-5,5, by=0.01), df=49), xlab=expression(paste("x-bar -",mu," (measured in mph)", sep="")), ylab="Density", type="l")
abline(v=17, col="red")
abline(v=15.4, col="blue")
polygon(x = c(17+seq(0,5, by=0.01)*0.7478, 21), y = c(0, dt(seq(0,5, by=0.01), df=49)), col = "red")
polygon(x = c(12, 17+seq(-5,-2.14, by=0.01)*0.7478), y = c(dt(seq(-5,-2.14, by=0.01), df=49), 0), col = "blue")
abline(h=0, col="black")
abline(v=17 + qt(0.1, df=49)*0.7478, col="black", lty=3)

The probability of seeing such a small sample mean if the true average is 17 is only 0.01869. The probability above the dotted black line is 0.9 with 0.1 below. With 90% confidence, any sample mean below this line would be sufficient evidence to reject the claim that the true average is 17 or above.
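
The shaded areas are integrals of the \(t\) density, so they can be verified numerically with integrate(). This is a sketch; area_blue and area_below_dotted are names of my own.

```r
# Area under the t density to the left of the observed t [the blue region]
area_blue <- integrate(dt, lower = -Inf, upper = -2.1397, df = 49)$value

# Area to the left of the dotted line, which sits at the 10th percentile
crit <- qt(0.1, df = 49)
area_below_dotted <- integrate(dt, lower = -Inf, upper = crit, df = 49)$value

round(c(blue = area_blue, below_dotted = area_below_dotted), 4)
```

Numerical integration agrees with pt() up to the integrator's tolerance.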

The Confidence Interval

Let’s take the sample mean as the center and work out a confidence interval at 90%. The upper bound is exactly the 16.37143 given above.

plot(x=15.4+seq(-5,5, by=0.01)*0.7478, dt(seq(-5,5, by=0.01), df=49), xlab=expression(paste(mu," | x-bar (measured in mph)", sep="")), ylab="Density", type="l")
abline(v=15.4, col="blue")
abline(v=15.4 - qt(0.1, df=49)*0.7478, col="black", lty=3)
polygon(x = c(11, 15.4+seq(-5,1.3, by=0.01)*0.7478), y = c(dt(seq(-5,1.3, by=0.01), df=49), 0), col = "blue")

As an aside, 17 has exactly 0.01869 probability above it shown in orange.

plot(x=15.4+seq(-5,5, by=0.01)*0.7478, dt(seq(-5,5, by=0.01), df=49), xlab=expression(paste(mu," | x-bar (measured in mph)", sep="")), ylab="Density", type="l")
abline(v=15.4, col="blue")
abline(v=15.4 - qt(0.1, df=49)*0.7478, col="black", lty=3)
polygon(x = c(11, 15.4+seq(-5,1.3, by=0.01)*0.7478), y = c(dt(seq(-5,1.3, by=0.01), df=49), 0), col = "blue")
polygon(x = c(15.4+seq(2.14,5, by=0.01)*0.7478, 17), y = c(dt(seq(2.14,5, by=0.01), df=49), 0), col = "orange")
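
The orange area follows from the symmetry of the \(t\) distribution: the probability above 17 when centering on 15.4 equals the probability below 15.4 when centering on 17. A brief base R check, with illustrative names:

```r
se <- sd(cars$speed) / sqrt(50)

# Centered on the sample mean: probability above 17 [the orange region]
above_17 <- pt((17 - mean(cars$speed)) / se, df = 49, lower.tail = FALSE)

# Centered on 17: probability of a sample mean of 15.4 or smaller
below_xbar <- pt((mean(cars$speed) - 17) / se, df = 49)

c(above_17 = above_17, below_xbar = below_xbar)
```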

Headings

This style provides first and second-level headings (that is, # and ##), demonstrated in the next section. You may get unexpected output if you try to use ### and smaller headings.

In his later books [for example, Beautiful Evidence], Tufte starts each section with a bit of vertical space, a non-indented paragraph, and sets the first few words of the sentence in small caps. To accomplish this using this style, call the newthought() function in tufte in an inline R expression `r ` as demonstrated at the beginning of this paragraph. [Note: you should not assume tufte has been attached to your R session. You should either library(tufte) in your R Markdown document before you call newthought(), or use tufte::newthought().]

Figures

Margin Figures

Images and graphics play an integral role in Tufte’s work. To place figures in the margin you can use the knitr chunk option fig.margin = TRUE. For example:

MPG vs horsepower, colored by transmission.

library(ggplot2)
mtcars2 <- mtcars
mtcars2$am <- factor(
  mtcars$am, labels = c('automatic', 'manual')
)
ggplot(mtcars2, aes(hp, mpg, color = am)) +
  geom_point() + geom_smooth() +
  theme(legend.position = 'bottom')

Note the use of the fig.cap chunk option to provide a figure caption. You can adjust the proportions of figures using the fig.width and fig.height chunk options. These are specified in inches, and will be automatically scaled down to fit within the handout margin.

Arbitrary Margin Content

In fact, you can include anything in the margin using the knitr engine named marginfigure. Unlike R code chunks ```{r}, you write a chunk starting with ```{marginfigure} instead, then put the content in the chunk. See an example on the right about the first fundamental theorem of calculus.

We know from the first fundamental theorem of calculus that for \(x\) in \([a, b]\): \[\frac{d}{dx}\left( \int_{a}^{x} f(u)\,du\right)=f(x).\]

For the sake of portability between LaTeX and HTML, you should keep the margin content as simple as possible (syntax-wise) in the marginfigure blocks. You may use simple Markdown syntax like **bold** and _italic_ text, but please refrain from using footnotes, citations, or block-level elements (e.g. blockquotes and lists) there.

Note: if you set echo = FALSE in your global chunk options, you will have to add echo = TRUE to the chunk to display a margin figure, for example ```{marginfigure, echo = TRUE}.

Full Width Figures

You can arrange for figures to span across the entire page by using the chunk option fig.fullwidth = TRUE.

ggplot(diamonds, aes(carat, price)) + geom_smooth() +
  facet_grid(~ cut)

A full width figure.

Other chunk options related to figures can still be used, such as fig.width, fig.cap, out.width, and so on. For full width figures, usually fig.width is large and fig.height is small. In the above example, the plot size is \(10 \times 2\).

Arbitrary Full Width Content

Any content can span to the full width of the page. This feature requires Pandoc 2.0 or above. All you need is to put your content in a fenced Div with the class fullwidth, e.g.,

::: {.fullwidth}
Any _full width_ content here.
:::

Below is an example:

R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under the terms of the GNU General Public License versions 2 or 3. For more information about these matters see https://www.gnu.org/licenses/.

Main Column Figures

Besides margin and full width figures, you can of course also include figures constrained to the main column. This is the default type of figures in the LaTeX/HTML output.

ggplot(diamonds, aes(cut, price)) + geom_boxplot()

A figure in the main column.

Sidenotes

One of the most prominent and distinctive features of this style is the extensive use of sidenotes. There is a wide margin to provide ample room for sidenotes and small figures. Any use of a footnote will automatically be converted to a sidenote. [Sidenote: This is a sidenote that was entered using a footnote.]

If you’d like to place ancillary information in the margin without the sidenote mark (the superscript number), you can use the margin_note() function from tufte in an inline R expression. [Margin note: This is a margin note. Notice that there is no number preceding the note.] This function does not process the text with Pandoc, so Markdown syntax will not work here. If you need to write anything in Markdown syntax, please use the marginfigure block described previously.

References

References can be displayed as margin notes for HTML output. For example, we can cite R here (R Core Team 2022). To enable this feature, you must set link-citations: yes in the YAML metadata, and the version of pandoc-citeproc should be at least 0.7.2. You can always install your own version of Pandoc from https://pandoc.org/installing.html if the version is not sufficient. To check the version of pandoc-citeproc in your system, you may run this in R:

system2('pandoc-citeproc', '--version')

If your version of pandoc-citeproc is too low, or you did not set link-citations: yes in YAML, references in the HTML output will be placed at the end of the output document.

Tables

You can use the kable() function from the knitr package to format tables that integrate well with the rest of the Tufte handout style. The table captions are placed in the margin like figures in the HTML output.

knitr::kable(
  mtcars[1:6, 1:6], caption = 'A subset of mtcars.'
)

A subset of mtcars.

                    mpg   cyl  disp  hp   drat  wt
Mazda RX4           21.0  6    160   110  3.90  2.620
Mazda RX4 Wag       21.0  6    160   110  3.90  2.875
Datsun 710          22.8  4    108   93   3.85  2.320
Hornet 4 Drive      21.4  6    258   110  3.08  3.215
Hornet Sportabout   18.7  8    360   175  3.15  3.440
Valiant             18.1  6    225   105  2.76  3.460

Block Quotes

We know from the Markdown syntax that paragraphs that start with > are converted to block quotes. If you want to add a right-aligned footer for the quote, you may use the function quote_footer() from tufte in an inline R expression. Here is an example:

“If it weren’t for my lawyer, I’d still be in prison. It went a lot faster with two people digging.”

— Joe Martin

Without using quote_footer(), it looks like this (the second line is just a normal paragraph):

“Great people talk about ideas, average people talk about things, and small people talk about wine.”

— Fran Lebowitz

Responsiveness

The HTML page is responsive in the sense that when the page width is smaller than 760px, sidenotes and margin notes will be hidden by default. For sidenotes, you can click their numbers (the superscripts) to toggle their visibility. For margin notes, you may click the circled plus signs to toggle visibility.

More Examples

The rest of this document consists of a few test cases to make sure everything still works well in slightly more complicated scenarios. First we generate two plots in one figure environment with the chunk option fig.show = 'hold':

p <- ggplot(mtcars2, aes(hp, mpg, color = am)) +
  geom_point()
p
p + geom_smooth()

Two plots in one figure environment.

Then two plots in separate figure environments (the code is identical to the previous code chunk, but the chunk option is the default fig.show = 'asis' now):

p <- ggplot(mtcars2, aes(hp, mpg, color = am)) +
  geom_point()
p

Two plots in separate figure environments (the first plot).

p + geom_smooth()

Two plots in separate figure environments (the second plot).

You may have noticed that the two figures have different captions, and that is because we used a character vector of length 2 for the chunk option fig.cap (something like fig.cap = c('first plot', 'second plot')).

Next we show multiple plots in margin figures. Similarly, two plots in the same figure environment in the margin:

Two plots in one figure environment in the margin.

p
p + geom_smooth(method = 'lm')

Then two plots from the same code chunk placed in different figure environments:

knitr::kable(head(iris, 15))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa

Two plots in separate figure environments in the margin (the first plot).

p
knitr::kable(head(iris, 12))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa

Two plots in separate figure environments in the margin (the second plot).

p + geom_smooth(method = 'lm')
knitr::kable(head(iris, 5))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa

We blended some tables in the above code chunk only as placeholders to make sure there is enough vertical space among the margin figures, otherwise they will be stacked tightly together. For a practical document, you should not insert too many margin figures consecutively and make the margin crowded.

You do not have to assign captions to figures. We show three figures with no captions below in the margin, in the main column, and in full width, respectively.

# a boxplot of weight vs transmission; this figure
# will be placed in the margin
ggplot(mtcars2, aes(am, wt)) + geom_boxplot() +
  coord_flip()
# a figure in the main column
p <- ggplot(mtcars, aes(wt, hp)) + geom_point()
p

# a fullwidth figure
p + geom_smooth(method = 'lm') + facet_grid(~ gear)

Some Notes on Tufte CSS

There are a few other things in Tufte CSS that we have not mentioned so far. If you prefer sans-serif fonts, use the function sans_serif() in tufte. For epigraphs, you may use a pair of underscores to make the paragraph italic in a block quote, e.g.

I can win an argument on any topic, against any opponent. People know this, and steer clear of me at parties. Often, as a sign of their great respect, they don’t even invite me.

— Dave Barry

We hope you will enjoy the simplicity of R Markdown and this R package, and we sincerely thank the authors of the Tufte-CSS and Tufte-LaTeX projects for developing the beautiful CSS and LaTeX classes. Our tufte package would not have been possible without their heavy lifting.

You can turn on/off some features of the Tufte style in HTML output. The default features enabled are:

output:
  tufte::tufte_html:
    tufte_features: ["fonts", "background", "italics"]

If you do not want the page background to be lightyellow, you can remove background from tufte_features. You can also customize the style of the HTML page via a CSS file. For example, if you do not want the subtitle to be italic, you can define

h3.subtitle em {
  font-style: normal;
}

in, say, a CSS file my_style.css (under the same directory of your Rmd document), and apply it to your HTML output via the css option, e.g.,

output:
  tufte::tufte_html:
    tufte_features: ["fonts", "background"]
    css: "my_style.css"

There is also a variant of the Tufte style in HTML/CSS named “Envisioned CSS”. This style can be used by specifying the argument tufte_variant = 'envisioned' in tufte_html() [the actual Envisioned CSS was not used in the tufte package; we only changed the fonts, background color, and text color based on the default Tufte style], e.g.

output:
  tufte::tufte_html:
    tufte_variant: "envisioned"

To see the R Markdown source of this example document, you may follow this link to GitHub, use the wizard in RStudio IDE (File -> New File -> R Markdown -> From Template), or open the Rmd file in the package:

file.edit(
  tufte:::template_resources(
    'tufte_html', '..', 'skeleton', 'skeleton.Rmd'
  )
)

This document is also available in Chinese, and its envisioned style can be found here.